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Abstract This paper introduces and explores a class of polynomial precondition¬ 
ers for the Helmholtz equation, denoted as expansion preconditioners EX(m), that 
form a direct generalization to the classical complex shifted Laplace (CSL) pre¬ 
conditioner. The construction of the EX (m) preconditioner is based on a truncated 
Taylor series expansion of the original Helmholtz operator inverse. The expansion 
preconditioner is shown to significantly improve Krylov solver convergence rates 
for growing values of the number of series terms m. However, the addition of mul¬ 
tiple terms in the expansion also increases the computational cost of applying the 
preconditioner. A thorough cost-benefit analysis of the addition of extra terms in the 
EX (in) preconditioner proves that the CSL or EX (1) preconditioner is the most effi¬ 
cient member of the expansion preconditioner class for general practical and solver 
problem settings. Additionally, possible extensions to the expansion preconditioner 
class that further increase preconditioner efficiency are suggested, and numerical 
experiments in ID and 2D are presented to validate the theoretical results. 

Key words: Helmholtz equation, Krylov subspace methods, preconditioning, shifted 
Laplacian, multigrid methods 




Chapter 1 

On the optimality of shifted Laplacian in a class 
of polynomial preconditioners for the Helmholtz 
equation 


Siegfried Cools, Wim Vanroose 


Summary. This paper introduces and explores a class of polynomial precondition¬ 
ers for the Helmholtz equation, denoted as expansion preconditioners EX(m ), that 
form a direct generalization to the classical complex shifted Laplace (CSL) pre¬ 
conditioner. The construction of the EX(m) preconditioner is based on a truncated 
Taylor series expansion of the original Helmholtz operator inverse. The expansion 
preconditioner is shown to significantly improve Krylov solver convergence rates 
for growing values of the number of series terms m. However, the addition of mul¬ 
tiple terms in the expansion also increases the computational cost of applying the 
preconditioner. A thorough cost-benefit analysis of the addition of extra terms in the 
EX (m) preconditioner proves that the CSL or EX( 1) preconditioner is the most ef¬ 
ficient member of the expansion preconditioner class for general practical problem 
and solver settings. Additionally, possible extensions to the expansion precondi¬ 
tioner class that further increase preconditioner efficiency are suggested, and nu¬ 
merical experiments in ID and 2D are presented to validate the theoretical results. 


1.1 Introduction 

1.1.1 Overview of recent developments 

The propagation of waves through any material is often mathematically modeled by 
the Helmholtz equation, which represents the time-independent waveforms in the 
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Fig. 1.1 Top: spectrum of the ID Helmholtz operator discretized using second order finite dif¬ 
ferences on a 48 x 48 equidistant grid with standard homogeneous Dirichlet boundary conditions. 
Bottom: spectrum of the corresponding Complex Shifted Laplacian (CSL) operator. 


frequency domain. For high wavenumbers, i.e. high spatial frequencies, the sparse 
linear system that results from the discretization of this PDE is distinctly indefinite, 
causing most of the classic direct and iterative solution methods to perform poorly. 
Over the past few years, many different Helmholtz solution methods have been pro¬ 
posed, an overview of which can be found in [18]. Krylov subspace methods like 
GMRES [31] or BiCGStab [40] are known for their robustness and are hence fre¬ 
quently used for the solution of Helmholtz problems [22, 3, 36, 27]. However, due 
to the indefinite nature of the problem, Krylov methods are generally not efficient 
as Helmholtz solvers without the inclusion of a suitable preconditioner. 

In this chapter we focus on the class of so-called shifted Laplace precondition¬ 
ers, which were introduced in [25] and [15], and further analyzed in [16, 17]. It was 
shown in the literature that contrary to the original discretized Helmholtz system, 
the complex shifted Laplace (CSL) system (or damped Helmholtz equation) can 
be solved efficiently using iterative methods [29, 18]. Originally introduced by Fe¬ 
dorenko in [19], multigrid methods [7, 8, 37, 9, 39, 13] have been proposed as scal¬ 
able solution methods for the shifted Laplace system in the literature [17]. Typically 
only one multigrid V-cycle on the CSL system yields a sufficiently good approxi¬ 
mate inverse, which can then be used as a preconditioner to the original Helmholtz 
system [16, 14, 28, 32]. The main concept behind the shifted Laplace preconditioner 
is deceivingly simple: by shifting the spectrum of the Helmholtz operator down into 
the complex plane, close-to-zero eigenvalues (leading to near-singularity) that pos¬ 
sibly destroy the iterative solver convergence are avoided, as illustrated in Figure 
1.1. Nevertheless, the results in this work will show that this apparent simplicity is 
exactly what makes the CSL preconditioner into a powerful tool for the iterative 
solution of the Helmholtz equation. 
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1.1.2 Outline of this chapter 

This study presents a generalization of the class of shifted Laplace preconditioners, 
which is based on a Taylor series expansion [26, 6, 2] of the original Helmholtz 
operator inverse around a complex shifted Laplacian operator. This formulation re¬ 
lates the original Helmholtz inverse to an infinite sum of shifted Laplace problems. 
By truncating the series we are able to define a class of so-called expansion pre¬ 
conditioners, denoted by EX (in), where the number of terms m in the expansion is 
a parameter of the method. The expansion preconditioner directly generalizes the 
classic complex shifted Laplace preconditioner, since the CSL preconditioner ap¬ 
pears as the operator EX (1), i.e. the first term in the Taylor expansion. 

Using a spectral analysis [16, 41, 12], the incorporation of additional series terms 
in the EX (, m ) preconditioning polynomial is shown to greatly improve the spectral 
properties of the preconditioned system. When used as a preconditioner the EX(m ) 
operator hence allows for a significant reduction of the number of outer Krylov steps 
required to solve the Helmholtz problem for growing values of in. However, the ad¬ 
dition of multiple terms in the expansion also increases the computational cost of 
applying the preconditioner, since each additional series term comes at the cost of 
one extra shifted Laplace operator inversion. The performance trade-off between the 
reduction of the number of outer Krylov iterations and the cost of additional terms 
(CSL inversions) in the preconditioner polynomial is analyzed in-depth. Further¬ 
more, several theoretical extensions to the expansion preconditioner are introduced 
to improve preconditioner efficiency. These extensions provide the reader with sup¬ 
plementary insights into Helmholtz preconditioning. The proposed methods show 
similarities to the research on flexible Krylov methods [30, 35] and more specifi¬ 
cally the work on multi-preconditioned GMRES [21]. 

A variety of numerical experiments are performed to validate the EX (in) precon¬ 
ditioner and illustrate the influence of the number of series terms m on convergence. 
Performance and computational cost of CSL- and EX (m )-preconditioned BiCGStab 
[40] are compared for one- and two-dimensional Helmholtz model problems with 
absorbing boundary conditions. These absorbing boundaries are implemented using 
Exterior Complex Scaling (ECS) [1, 34, 28], a technique which has been related to 
Perfectly Matched Layers (PMLs) [5] by Chew and Weedon [11]. 

The remainder of this chapter is organized as follows. In Section 1.2 we outline 
the theoretical framework for this work and we introduce the expansion precondi¬ 
tioner class EX(m). Following its formal definition, an overview of the theoretical, 
numerical and computational properties of the expansion preconditioner is given. 
Section 1.3 presents several possible extensions to the proposed EX(m) precondi¬ 
tioner, which are shown to improve preconditioner efficiency even further. The new 
preconditioner class is validated in Section 1.4, where it is applied to a ID and 2D 
Helmholtz benchmark problem. A spectral analysis confirms the asymptotic exact¬ 
ness of the expansion preconditioner as the number of terms m grows towards infin¬ 
ity. Additionally, experiments are performed to compare the efficiency and compu¬ 
tational cost of the EX (in ) preconditioner for various values of m. Conclusions and 
a short discussion on the results are formulated in Section 1.5. 
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1.2 The expansion preconditioner 

In this section we introduce the general framework for the construction of the expan¬ 
sion preconditioner. Starting from the notion of the classic complex shifted Laplace 
operator, we define the class of expansion preconditioners based on a Taylor expan¬ 
sion of the original Helmholtz operator inverse around a shifted Laplace problem 
with an arbitrary shift parameter. The definition of the expansion preconditioner is 
followed by an overview of the fundamental analytical and computational properties 
of the new preconditioner class. 


1.2.1 The complex shifted Laplacian preconditioner 

In this work we aim to construct an efficient solution method for the d-dimensional 
Helmholtz equation 


(—A — k 2 (x))«(x) =/(x), xGQcM. d , (1.1) 

with outgoing wave boundary conditions 

m = outgoing on dQ, (1.2) 

where —k(x) 2 is a distinctly negative shift. Here k £ E designates the wavenumber, 
which will be assumed to be a spatially independent constant throughout most of this 
work for simplicity. However, note that the definitions in this section do not depend 
on this assumption. The above equation is discretized using a finite difference, finite 
element or finite volume scheme of choice, yielding a system of linear equations of 
the general form 

Au = f, (1.3) 

where the matrix operator A represents a discretization of the Helmholtz operator 
A = (—A — k 1 ). It has been shown in the literature that iterative methods in gen¬ 
eral, and multigrid in particular, fail at efficiently solving the discretized Helmholtz 
system (1.3) due to the indefiniteness of the operator A [13, 18]. However, the ad¬ 
dition of a complex shift in the Helmholtz system induces a damping. This allows 

for a more efficient solution of the resulting system, which is known as the complex 

shifted Laplacian (CSL) 

(-A — (1 + fii)k 2 (x))u(x) =/(x), (1.4) 

where p £ R 1 is the complex shift (or damping) parameter that is conventionally 
chosen to be positive [15, 17]. The discretized shifted Laplace system is denoted by 


Mu = /, 


( 1 . 5 ) 
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where M is the discretization of the complex shifted Helmholtz operator M = (—A — 

(1+/3 i)k 2 ). 

It is well-known that this system can be solved using multigrid when the shift 
parameter ft is sufficiently large [15, 16, 14], Furthermore, it has been shown in the 
literature that the complex shift parameter p should be chosen at least as large as 
the critical value j3 m ; n . For a multigrid V-cycle with the standard linear interpolation 
and full weighting restriction operators and a traditional ©-Jacobi or Gauss-Seidel 
type smoothing scheme, the rule-of-thumb value for j8 m i n was shown to be 0.5 for a 
V(l,0)-cycle [16], and lies roughly around 0.6 when solving the CSL system using a 
V(l,l) multigrid cycle [12]. Note that the latter value for the shift will be commonly 
used throughout this chapter. 

The exact solution to (1.4) is a damped waveform that is fundamentally different 
from the solution to the original Helmholtz system (1.1). However, the inverse of 
the shifted matrix operator M can be used as a preconditioner to the original system. 
This preconditioning technique is known as the complex shifted Laplace precondi¬ 
tioner, and was shown to perform well as a preconditioner for Helmholtz problems, 
see [16, 14,41], 


1.2.2 The class of expansion preconditioners 

1.2.2.1 General criteria for preconditioner efficiency 

The aim of this work is to extend the existing class of shifted Laplace precondition¬ 
ers to obtain a more efficient preconditioner to the original Helmholtz system (1.3). 
Moreover, we would like to construct a preconditioner M such that M « A or, as an 
equivalent measure, we require that the eigenvalues of the preconditioned operator 
M~ l A are concentrated around one, i.e. 

spec(M~*A) ss 1. (1.6) 

In the context of an efficient iterative solution, the condition number K = k(M 1 /\) 
of the preconditioned system is often related to the number of Krylov iterations [23], 
Although this relation is somewhat heuristic in the context of non-normal matrices 
[38], we believe that it provides an insightful intuition on preconditioner efficiency. 
The above requirement ( 1 .6) can hence be broadened by requiring that the condition 
number K approximately equals one, i.e. 

k(M~ 1 A) ss 1. (1.7) 

We trivially note that by the above characterizations the best preconditioner to the 
Helmholtz system (1.3) is (a good approximation to) the original operator A itself. 
However, the discrete operator A is generally close to singular and hence cannot 
be easily inverted in practice. On the other hand, given a sufficiently large complex 
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shift, the CSL system (1.4) can be approximately inverted using e.g. a (series of) 
multigrid V-cycle(s). Note that the complex shifted Laplacian is generally not a very 
precise approximation to the original Helmholtz operator unless the shift parameter 
/j is very small, which in practice is not achievable due to stability requirements on 
the preconditioner inversion, see [12], 

Following the general idea of preconditioning, the optimal Helmholtz precondi¬ 
tioner M 0pt thus ideally satisfies the following two key properties, which are inspired 
by analogous conditions that were formulated in [20]: 

(PI) M^} is a good approximation to the exact inverse of the original Helmholtz 
operator A -1 , such that condition (1.6) is satisfied, i.e. the spectrum of the 
preconditioned operator M^A is clustered around one, 

(P2) for any given vector v, Mop}v' can be efficiently computed iteratively. This 
implies a good approximation to M op [ v is found after a ‘moderate’ number 
of iterations of the chosen method, with a manageable cost per iteration. 

In the context of this chapter condition (P2) is satisfied if M op [ can be formulated 
in terms of shifted Laplace operator inverses with a shift parameter fi that is suf¬ 
ficiently large to ensure a stable iterative solution of the shifted Laplace inverses. 
Indeed, given that the shift parameter /3 is sufficiently large, a good approximation 
to the CSL inverse can be computed using e.g. only one multigrid V-cycle, see [16]. 

Note that due to the strong indefiniteness of the Helmholtz operator conditions 
(PI) and (P2) are generally incompatible. The classic CSL preconditioner trivially 
satisfies the second condition given that the shift parameter fi is large enough, how¬ 
ever, the first condition is typically violated when fi is large. In the following we 
aim at constructing a preconditioning scheme based upon the shifted Laplace pre¬ 
conditioner, which effectively satisfies both of the above conditions. 


1.2.2.2 Taylor series expansion of the inverse Helmholtz operator 

The complex shifted Laplace preconditioning operator M can be written more gen¬ 
erally as 

M(I5) = —A —k 2 (x) +P(/3,x), (1.8) 

where P( j3,x) is a possibly spatially dependent linear operator in the shift param¬ 
eter fi £ R + , satisfying P(0. x) = 0, such that M(0) = A. The above formulation 
(1.8) characterizes, apart from the complex shifted Laplace (CSL) operator, also the 
concept of complex stretched grid (CSG), where the underlying grid is rotated into 
the complex plane. This results in a damped Helmholtz problem that is equivalent 
to complex shifted Laplacian, see [29], For the remainder of this text we however 
assume P(j 3,x) = — j3 ik 2 (x) as suggested by (1.4). 

We define an operator functional / based on the general shifted Laplacian oper¬ 
ator M as follows: 
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/( P) :=M(l 3)- 1 = (-4 - (1 +/ 3 /)* 2 )" 1 . ( 1 . 9 ) 

Choosing = 0 in the above expression results in the inverse of the original 
Helmholtz operator A = M( 0), whereas choosing /) > 0 yields the inverse of the 
shifted Laplace operator M( j3). The derivatives of the functional / are given by 

/W(/3) = n! ( k 2 i) n {-A - (1 +pi)k 2 )-( n+1 \ (1.10) 

for any n £ N. Constructing a Taylor series expansion [26, 6, 2] of /(j3) around a 
fixed shift j3o £ R ' leads now to the following expression 

(Til) 

to n[ 

where the derivatives /-”*(A>) are defined by (1.10). Note that the derivatives of / 
in (1.11) are negative powers of the complex shifted Laplace operator M(j3o). By 
evaluating the functional /(/3) i n /j = 0 and by choosing a sufficiently large positive 
value for j3o, equation (1.11) yields an approximation of the original Helmholtz 
operator inverse in terms of CSL operator inverses, i.e. 

to n - 

= £ (-pok 2 i) n (-A - (1 +Mk 2 r (n+1) 

n =0 

oo 

= E (-Pok 2 i) n M Q3 0 )- ( " +1) . (1.12) 

H=0 

Hence, the computation of the infinite series of easy-to-compute inverse CSL oper¬ 
ators j3o) with an arbitrary shift parameter j3o asymptotically results in the exact 
inversion of the original Helmholtz operator M(0) = A. 


1.2.2.3 Definition of the expansion preconditioner 

By truncating the expansion in (1.12), we can now define a new class of polynomial 
Helmholtz preconditioners. For a given m, each particular member of this precondi¬ 
tioner class is denoted as the expansion preconditioner of degree m. 


m— 1 

EX(m) := £ a„ (-4 - (1 + 0oO* 2 r ( " +1) , 


n =0 


where the coefficients do ,..., a m _i are defined as 


a,, = (-Pok 2 i) n , 


(1.13) 


(0 < n < m — 1 ), 


(1.14) 
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by the Taylor series expansion (l.lO)-(l.ll). The expansion preconditioner EX (in) 
is hence a degree m polynomial in the inverse complex shifted Laplace operator 
M(j5 o)^ 1 = (—A — (1 + j3o i)k 2 )~ l . The above Taylor series approach appears quite 
natural. However, other series approximations to the Helmholtz operator inverse 
may be constructed using alternative choices for the series coefficients. We refer to 
Section 1.3 for a more elaborate discussion on the choice of the series coefficients. 


1.2.2.4 Properties of the expansion preconditioner 

Following the formal definition ( 1.1 3), we formulate some essential properties of the 
EX (m) class of preconditioners in this section. Firstly, one trivially observes that the 
classic CSL preconditioner is a member of the class of expansion preconditioners. 
Indeed, the complex shifted Laplace inverse M( is the first order term in the 
Taylor expansion (1.12), and hence we have M(Pq)^ 1 = EX(\). 

By including additional terms in the preconditioning polynomial (i.e. for m —» °°), 
the EX ( m ) preconditioner becomes an increasingly accurate approximation to the 
original Helmholtz operator inverse A~ 1 . Hence, the class of EX (in) preconditioners 
is asymptotically exact, since 

lim EX(m)= V a n (-A-(l+p Q i)k 2 r {n+1) =M(0)“'. (1.15) 

m —>°o 

n =0 

This implies that, if we assume that the computational cost of computing the in¬ 
verse matrix powers in (1.15) is manageable, EX(m) satisfies both conditions (PI) 
and (P2) for efficient Helmholtz preconditioning suggested in Section 1.2.2.1. It 
should be stressed that (PI) in fact holds asymptotically, and is thus in practice only 
satisfied when a large number of series terms is taken into account. The EX(m ) 
preconditioner is thus expected to be increasingly more efficient for growing m, 
which suggests a significant reduction in the number of outer Krylov iterations. On 
the other hand, condition (P2) is satisfied when m is not too large. This creates a 
trade-off for the value of m, which is commented on in Section 1.2.2.5. 

While the approximation precision of the EX(m) preconditioner clearly benefits 
from the addition of multiple terms in the expansion, note that the accuracy of the m- 
term EX(m ) approximation is governed by the truncation error of the series (1.11). 
This truncation error is of order for any EX (in) preconditioner (with m > 0), 

i.e. 

M(0) -1 = EX(m) + ^(J3 0 m ). (1.16) 

The efficiency of the EX (in) preconditioner is hence also intrinsically dependent 
on the value of the shift parameter /3o- However, it is well-known that j3o cannot be 
chosen below a critical value for iterative (multigrid) solver stability, which typically 
lies around 0.5 or 0.6, see [16, 12], Consequently, it is clear from (1.16) that con¬ 
vergence of the Taylor series (1.11) is slow, being in the order of [it "'. This indicates 
that a large number of terms has to be taken into account in the EX (m) polynomial 
to obtain a high-precision approximation to the original Helmholtz operator inverse. 
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1.2.2.5 Computational cost of the expansion preconditioner 

The inclusion of additional series terms yields a higher-order EX ( m ) preconditioner 
polynomial, which is expected to improve preconditioning efficiency as derived 
above. Therefore, if the computational cost of the CSL inversions would be neg¬ 
ligible compared to the cost of applying one Krylov iteration, there would theoret¬ 
ically be no restriction on the number of terms that should be included in EX(m). 
Unfortunately, even when approximating each CSL inversion by one V-cycle, the 
computational cost of the CSL inversions is the main bottleneck for the global cost 
of the solver in practice. Indeed, while the addition of multiple series terms improves 
performance, it also increases the computational cost of applying the preconditioner. 
In this section we briefly expound on the computational cost of the EX(m) precon¬ 
ditioner using a simple theoretical cost model. 

We model the computational cost of the EX (m) preconditioner by assuming that 
its cost is directly proportional to the number of CSL operator inversions that need 
to be performed when solving the preconditioning system. Each additional term in 
the series (1.11) requires exactly one extra shifted Laplace system to be inverted, 
since the EX(m) polynomial can be constructed as follows: 

EX(l)w = aoM(P 0 y l w, 
v o 

EX (2 )w = aovo + M(/3o)~ 1 vq , 

:=vi 


EX(m)w 


m—2 

OL n Vn H - CX>m— 1 V m —2 ? 

77=0 V V / 


(1.17) 


where w £ R' v is a given vector of size N, i.e. the number of unknowns. Note that 
all complex shifted Laplace systems in (1.17) feature the same shift parameter /3o- 
Hence, an additional CSL system of the form 


M(/3o)V; = v,_i, with v ~1 := w. (0<i<m— 1), (1-18) 

has to be solved for each term in the expansion preconditioner, resulting in a total 
of m inversions to be performed. We again stress that in practice the shifted Laplace 
inverse M(j3o) _1 is never calculated explicitly, but the approximate solution to (1.18) 
is rather computed iteratively by a multigrid V-cycle. 

The question rises whether the reduction in outer Krylov iterations when us¬ 
ing the multi-term EX(m) preconditioning polynomial compensates for the rising 
cost of the additional (approximate) CSL inversions. Let the computational cost 
of one approximate CSL inversion be denoted as one work unit (1 WU), and let 
the total computational cost of the EX(m) preconditioner in a complete EX(m)- 
preconditioned Krylov solve be denoted by % ot . If the number of Krylov iterations 
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until convergence (up to a fixed tolerance tol) is p(m), then % ot = m ■ p(in) WU. 
Hence, it should hold that p(m) < C/m for some moderate constant C for the cost 
of the EX(m ) preconditioner to support the inclusion of multiple series terms. Nu¬ 
merical experiments in Section 1.4 of this work will show that this is generally not 
the case for the Taylor expansion polynomial in many practical applications, and 
the classic CSL preconditioner is hence the optimal choice for a preconditioner in 
the EX(m) class. In the next section we propose several extensions to the EX(m) 
preconditioner class to further improve its performance. 


1.3 Extensions and further analysis 

In this section we propose two theoretical extensions to the Taylor series represen¬ 
tation (1.11) for the inverse Helmholtz operator. These extensions provide essential 
insights into the expansion preconditioners and aim at further improving precondi¬ 
tioner efficiency. The primary goal is to improve the performance of the expansion 
preconditioner class, resulting in a more cost-efficient preconditioner with respect to 
the number of terms m. The theoretical results obtained in this section are supported 
by various numerical experiments in Section 1.4 that substantiate the analysis and 
illustrate the efficiency of the extended expansion preconditioner. 


1.3.1 The expansion preconditioner as a stationary iterate 

We first consider an extension of the EX (m) preconditioner class that allows manual 
optimization of the series coefficients for each degree m. To this aim, we illustrate 
how the EX(m) preconditioner can be interpreted as the m-th iterate of a specific 
fixed-point iteration. Consequently, a class of extended expansion preconditioners 
is defined by optimizing the fixed-point iteration. 


1.3.1.1 Taylor series-based polynomial preconditioners as fixed-point iterates 

Recall that the foundation for the Taylor-based EX (in) preconditioner class pre¬ 
sented in Section 1.2 is the reformulation of the original Helmholtz operator inverse 
as a Taylor series. Equation (1.12) can alternatively be reformulated as 

m =M(0 )- 1 =M(A,)- 1 £ (-p 0 k 2 i) n M(l3o)- n 

n—0 

= M(/3o)“ I (/ + /3ok 2 /M(/3orT 1 , (1.19) 

where the last equation follows from the limit expression 
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y x n = - for be I < 1. (1-20) 

to l ~ x 

For general (matrix) operators, this series is known in the literature as a Neumann 
series [42]. Note that for the matrix equation (1.19) the requirement |x < 1 is met if 
p(—j3o^ 2 /M(j3o) _1 ) < 1, which is trivially satisfied since |A ; (M(j3o))| > | — fiok 2 i\ 
for all j ,N, where we assume M{ j3o) £ C NxN . For notational convenience, 

let us denote 

L-=-Pok 2 iM{l5 o)" 1 . (1.21) 

so that the last line in (1.19) reads 

M{0)- x =M(fo)-\l-L)- 1 . (1.22) 

Equation (1.22) shows that inverting the indefinite Helmholtz operator M(0) is 
equivalent to subsequently inverting the operator M (/3o ) followed by the inversion 
of the operator (I — L). The first operator is simply the inverse of a CSL operator 
and can easily be solved iteratively. However, the second inversion is non-trivial, as 
it requires the solution u to the system 


(. I-L)u = b , (1.23) 

given a right-hand side b £ The linear system (1.23) can alternatively be formu¬ 
lated as a fixed-point iteration (or stationary iterative method) 

u {m+ ') =Lu (m] +b, m> 0. (1.24) 

One observes that by setting u' >>} = 0, the m-th iterate of this fixed-point method 
generates the EX (in ) preconditioner, since 


U {m) ={y^b^(J-L)-'E 


which implies 


M(0) -1 b « M(/3 0 ) _1 r/ m) = 



m— 1 \ 

L L " * 


n=0 


EX(m) b. 


(1.25) 


(1.26) 


The fixed-point iteration (1.24) thus asymptotically generates the Taylor series ex¬ 
pansion (1.11). 

The truncation analysis in Section 1.2.2.4 indicated that the Taylor series displays 
a slow convergence. Alternatively, convergence behavior can now be analyzed by 
studying the convergence of the fixed-point iteration (1.24), which is governed by 
the spectral radius of the iteration matrix 

p{L)=p (-P 0 ik 2 {-A - (1 +/ 3 q i)k 2 )~ x ). 


( 1 . 27 ) 
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For j3o > 0 this spectral radius tends to be relatively close to one, since 


P(L) 


max 

1 <j<N 


-Poik 2 
(1 + p 0 i)k 2 


( min 

\l<j<N 


Xj — k 2 

P oik 2 


1 , 


(1.28) 


where A j (1 < j < N) are the eigenvalues of the negative Laplacian. Hence, the 
slow convergence of the Taylor series is apparent from the spectral properties of the 
fixed-point iteration. 


1.3.1.2 Weighted fixed-point iteration to improve convergence 

To obtain a convergence speed-up, the fixed-point iteration ( 1 .24) can be substituted 
by a more general weighted stationary iteration 

u (m+V) = (1 - o})u {m > + CoLu [m) + cob, co £ [0,2], m > 0, (1.29) 

for b £ $L n , which can alternatively be written as 

M (m+I ) = Lu (m) +cob, ©€[0,2], m> 0, (1.30) 

using the notation L := (1 — <o)I + coL. Setting u {)) = 0 as the initial guess, this 
iteration constructs a different class of polynomial expansion preconditioners for 
any choice of co £ [0,2], as follows; 

EX a (m)b:=M(Po)- l u^ m \ m> 0, (1.31) 

where iP m ' 1 is given by (1.30). We call this class of preconditioners the extended 
expansion preconditioner of degree m, and denote them by EX m (m) to indicate their 
dependency on co. 

The parameter co allows us to modify the coefficients of the series expansion 
to obtain a more suitable truncated series approximation to the original Helmholtz 
inverse. Note that the Taylor expansion preconditioner EX(m) can be constructed 
from iteration (1.30) by setting the parameter (0 = 1. Additionally, note that choos¬ 
ing co = 0 trivially yields the CSL preconditioner M(Pq)~ 1 = EXo(m) for all m > 0. 

The careful choice of the parameter co £ [0,2] in (1.31) possibly results in a 
series that converges faster than the Taylor series generated by (1.24). Indeed, the 
parameter co can be chosen to modify the polynomial coefficients such that 

p(L)<p(L), (1.32) 

yielding a series that converges faster than the Taylor series (1.11). Consequently, 
for the right choice of co, the EX a (m) truncated series results in a more efficient 
preconditioner than the original EX(m ) polynomial of the same degree. We refer to 
the numerical results in Section 1.4 to support this claim. 
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1.3.2 GMRES-based construction of the expansion polynomial 

The optimization of the series coefficients through the choice of the parameter ft) 
in the EX (a (///) operator can be generalized even further by letting the coefficients 
of the series expansion vary freely. Moreover, the coefficients can be optimized 
depending on the degree m of the preconditioning polynomial. By replacing the 
stationary fixed-point iterations in ( 1. 24)-( 1. 30) by a more advanced Krylov solution 
method, an optimal degree m polynomial approximation to the original Helmholtz 
operator can be constructed. 


1.3.2.1 Optimization of the expansion preconditioner 

For the fixed-point method ( 1 .24) we essentially constructed the Taylor polynomial 
EX(m) using a fixed linear combination of the following basis polynomials 

®{m) = {L,L(I + L), L (I + L + L 2 ),...}. (1.33) 

Alternatively, the extended expansion preconditioner EX a (m) was formed as a fixed 
linear combination of the basis polynomials 

^ (m) = { coL, (2® - ft) 2 )L + (0 2 L 2 ,...}, (1.34) 

for the weighted fixed-point iteration (1.30). Note that in this section L designates 
the inverse of the CSL operator as before, up to scaling by a scalar constant, i.e. 

L :=M(/3 0 ) -1 - (1.35) 

As a direct generalization of the above constructions, we now consider the co¬ 
efficients in each step of the iterative procedure to be variable. This boils down to 
constructing the preconditioning polynomial from the monomial basis 


3?(in) = span|L,L 2 ,L 3 , . (1.36) 

Since the preconditioning polynomial asymptotically results in the exact Helmholtz 
operator inverse, we can alternatively solve the preconditioning system 

Lv = g , with v=Au and g = Lf , (1-37) 

using m steps of GMRES [31], which results in construction of an //(-term polyno¬ 
mial from the Krylov basis 


JfmiL, A r 0 ) = span {A r 0 ,L A r 0 ,L 2 A r 0 , ...,L' n 'Aro}. (1.38) 

After an additional multiplication with the operator L, the m -th Krylov subspace ex¬ 
actly generates a preconditioning polynomial from the basis :7(m) (1.36). Hence, 
a generalized expansion preconditioner can be constructed by applying m steps of 
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GMRES on the system Lv = g, which allows for a free choice of the polynomial 
coefficients. Moreover, since GMRES minimizes the residual over the m -th Krylov 
subspace, the resulting m -term preconditioner is the optimal polynomial approxi¬ 
mation of degree m to the exact Helmholtz inverse. These concepts resemble the 
principles of polynomial smoothing by a GMRES(m)-based construction, see [10]. 


1.3.2.2 Simultaneous construction of preconditioner and Krylov solver basis 

Further extending the above methodology, we outline the theoretical framework for 
an integrated construction of the preconditioner polynomial in the Krylov subspace 
construction at the solver runtime level. The key notions in this section show some 
similarities to the work on multi-preconditioned GMRES in [21], We additionally 
refer to the closely related literature on flexible Krylov solvers [30, 35]. 

Consider the Krylov method solution to the EX (m)-preconditioned Helmholtz 
system 

EX (m) Au = EX (in) /, (1.39) 

for a fixed polynomial degree m. Using GMRES to solve this system, the k-th resid¬ 
ual rt = EX(m)(f — Au> k ’) is minimized over the Krylov subspace 

Jtfj c (EX(m)A,ro) = span jro, EX(m)Aro , ..., (. EX(m)A) k ~ l ro|. (1-40) 

Note that this basis spans an entirely different subspace compared to the Krylov 
subspace (1.38). Indeed, for the basis terms in (1.40) the preconditioning polynomial 
degree is fixed at m while the power of the Helmholtz operator/1 is variable. To form 
the generalized EX(m) polynomial in (1.38) on the other hand, the powers of the 
CSL inverse L vary but the power of A is fixed at one. 

A combination of the two principles characterized by (1.38) and (1.40) can be 
made by embedding the iterative procedure for the construction of the expansion 
preconditioner polynomial EX (m) in the governing Krylov solver. This results in a 
mixed basis consisting of a structured mixture of powers of the inverse CSL operator 
L and the original system matrix A applied to the initial residual ro. We denote the 
mixed basis corresponding to the EX (m ) preconditioner by 

J^ £Z( ” , )(A,ro) = span {C-M-Go : 1 < i < m, 0 </<&}, (1.41) 

for any m > 1. One trivially observes from the definition (1.41) that 

Gf X( "«-l)(A,r 0 ) C J(f k EX(m) (A,r 0 ), (1.42) 

which generalizes the embedding of the EX (1) or CSL preconditioner in the class 
of EX(m) expansion operators to the mixed basis setting. The subspace spanned by 
(1.41) theoretically allows for the simultaneous construction of the preconditioning 
polynomial and the solution of the preconditioned system. However, the mixed basis 
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(A, r o) generally does not span a Krylov subspace for any m > 1, making 
its practical construction non-trivial. 

As mentioned earlier, the addition of extra terms in the EX (m ) polynomial im¬ 
proves the polynomial approximation to the exact Helmholtz operator inverse, re¬ 
sulting in faster convergence in terms of outer Krylov iterations. This implies lower 
powers of the Helmholtz operator A in the mixed basis (1.41). However, the addition 
of extra terms in the polynomial EX(m) also increases the number of vectors con¬ 
stituting the mixed basis, and hence gives rise to a higher computational cost of the 
total method. This trade-off between preconditioner approximation precision and 
computational cost in function of the number of terms m is apparent from (1.41). 

As a final remark, note that the extensions proposed in this section are mainly 
intended as an insightful theoretical framework. In practice the GMRES-based ex¬ 
tended preconditioner proposed in Section 1.3.2.1 is unlikely to perform signifi¬ 
cantly better than the extended EX m (m) preconditioner proposed in the previous 
section. This is a consequence of the fact that, given a sufficiently large shift param¬ 
eter /3, the convergence speed of any monomial-based series of this type is slow, as 
was already pointed out in Section 1.2.2.4. 


1.4 Numerical results 

In this section we present experimental results that illustrate the practical applica¬ 
tion of the class of expansion preconditioners to enhance the Krylov convergence 
on a ID and 2D Helmholtz benchmark problem. The primary aim is to validate the 
EX (in) expansion preconditioner for degrees m > 1 and illustrate the asymptotic 
behavior of the EX(m) preconditioner as m —> °o. Initial numerical experiments in 
Sections 1.4.1-1.4.3 will use exact inverses of the complex shifted Laplace oper¬ 
ators appearing in the polynomial preconditioner. We then introduce a multigrid 
V(l,l)-cycle as an approximate solver for the CSL systems in the expansion. The 
performance of the EX(m) preconditioner is consequently compared to that of the 
classic complex shifted Laplace or EX (\) preconditioner. Additionally, the exten¬ 
sions to the class of generalized EX m (m) preconditioners and the combination of 
the preconditioner polynomial and Krylov basis construction (see Section 1.3) are 
shown to display the potential to improve the preconditioner’s efficiency. 


1.4.1 Problem setting: a ID constant wavenumber Helmholtz 
problem with absorbing boundary conditions 

Consider the one-dimensional constant wavenumber Helmholtz model problem on 
the unit domain 


(—A—k 2 )u(x)=f(x), xGQ = [ 0 , 1 ], 


( 1 . 43 ) 
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Fig. 1.2 Spectral analysis of the discretized ID Helmholtz model problem (1.43). Exact precon¬ 
ditioner inversion. Left: spectrum of the Helmholtz operator A with ECS absorbing boundary con¬ 
ditions. Right: spectrum of the polynomial preconditioned operator EX(m)A for various values of 
m. The spectrum becomes more clustered around 1 for increasing values of m. 


where the right-hand side f(x) represents a unit source in the domain center. The 
wavenumber is chosen to be k 2 = 2 x 10 4 . The equation is discretized using a 
standard Shortley-Weller finite difference discretization [33], required to treat the 
absorbing boundary layers (see below). The unit domain Q is represented by an 
iV+ 1 = 257 equidistant point grid, defined as Q 1 ’ = {xj = jh,0 < j < N}, respect¬ 
ing the physical wavenumber criterion kh = 0.5524 < 0.625 for a minimum of 10 
grid points per wavelength [4]. 

To simulate outgoing waves near the edges of the numerical domains, we use 
exterior complex scaling [34, 11], or ECS for short, adding absorbing layers to both 
sides of the numerical domain. The absorbing layers are implemented by the addi¬ 
tion of two artificial complex-valued extensions to the left and right of the domain 
£2, defined by the complex grid points {z.j = exp(;0£cs) ■*,/}’ where xj = jh , for 
■N/4 < j < 0 and N < j < 5 /4N. The ECS complex scaling angle that determines 
the inclination of the extensions in the complex plane is chosen as 9ecs = 7T/6. 
The two complex-valued extensions feature N/4 grid points each, implying the dis¬ 
cretized Helmholtz equation takes the form of an extended linear system 


An = f, (1.44) 

where A £ C? Nx ? N . The discretized right-hand side / = (/)) £ C? N is defined as 


fj = /(*;) 


1 for j = N/2, 
0 elsewhere. 


(1.45) 


representing a unit source located in the center of the domain for this example. 
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Fig. 1.3 Conditioning and EX (m)-BiCGStab performance on the discretized ID Helmholtz model 
problem (1.43). Exact preconditioner inversion. Left: condition number of the preconditioned op¬ 
erator EX ( m)A as a function of m. Right: number of EX (m)-BiCGStab iterations required to solve 
the Helmholtz system (1.44) as a function of m. 


The Helmholtz model problem (1.43) is solved using EX (m) -preconditioned 
BiCGStab [40] up to a relative residual tolerance ||rp||/||ro|| < tol = le—8. The 
CSL operator inverses in the EX (in) polynomial are either computed exactly using 
LU factorization for the purpose of analysis, or approximated using one multigrid V- 
cycle as is common in realistic applications. Note that the complex shift parameter /3 
in the CSL operators is chosen as /3 = 0.6, which guarantees multigrid V(l,l)-cycle 
stability [12], 


1.4.2 Spectral analysis of the expansion preconditioner 

To analyze the efficiency of the EX (in) Helmholtz preconditioner we perform a 
classic eigenvalue analysis of the EX (in )-preconditioned Helmholtz operator. For 
convenience of analysis, the CSL inversions in the EX (in) preconditioning polyno¬ 
mial are solved using a direct method in this section. 

The typical pitchfork shaped spectrum of the indefinite Helmholtz operator with 
ECS boundary conditions is shown in the left panel of Figure 1.2. The leftmost 
eigenvalue is located near —k 2 = —2 x 10 4 , while the rightmost eigenvalue is close 
to4//z 2 — k 2 k2.4x 10 s , cf. [28]. The right panel of Figure 1.2 shows the spectrum 
of the EX(m )-preconditioned Helmholtz operator for various numbers of terms in 
the Taylor polynomial EX (in). Note how the spectra become more clustered around 
1 when additional series terms are taken into account, illustrating the asymptotic 
exactness of the EX(m ) preconditioner class. 

The condition number of the preconditioned operator k(EX(iii)A) is displayed 
in the left panel of Figure 1.3 as a function of in. One observes that conditioning 
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m 

P{m) 

p(m) 

P( 1 ) 

m ■ p(m) 

i 

34 

1.00 

34 WU 

2 

22 

0.65 

44 WU 

3 

16 

0.47 

48 WU 

4 

13 

0.38 

52 WU 

5 

11 

0.32 

55 WU 


Table 1.1 Performance of £X(m)-BiCGStab for different values of m on the discretized ID 
Helmholtz model problem ( 1 .43). Exact preconditioner inversion. Column 1: number of BiCGStab 
iterations p(m ) required to solve the system (1.44). Col. 2: iteration ratio compared to classic CSL. 
Col. 3: preconditioner computational cost based on p(m). 


improves significantly by the addition of extra terms in the polynomial EX (in). This 
observation is reflected in the number of Krylov iterations required to solve the 
problem, which is displayed in Figure 1.3 (right panel) for a range of values of m. 
The number of Krylov iterations (right panel) appears to be directly proportional to 
the condition number of the preconditioned system (left panel). 


1.4.3 Performance analysis of the expansion preconditioner 

It is clear from the spectral analysis that the addition of multiple series terms im¬ 
proves the conditioning of the preconditioned system. In this section, the perfor¬ 
mance of the EX(m ) preconditioner is analyzed using the simple theoretical cost 
model introduced in Section 1.2.2.5. 

The experimentally measured performance of the EX (w)-BiCGStab solver on 
the model problem (1.43) is displayed in Table 1.1. The table shows the number 
of Krylov iterations required to solve system (1.44) until convergence up to tol 
= le—8 (first column), the iteration ratio compared to the standard CSL precondi¬ 
tioner (second column), and the effective number of work units (CSL inversions) 
for the entire run of the method (third column). The EX(m ) preconditioner becomes 
increasingly more efficient in reducing the number of Krylov iterations in function 
of larger m. Comparing e.g. the EX (3) preconditioner to the classic EX ( 1) (CSL) 
scheme, one observes that the number of Krylov iterations is slightly more than 
halved. The largest improvement is obtained by adding the first few terms, which is 
a consequence of the slow Taylor convergence. Note that the computational cost of 
the Krylov solver itself is not incorporated into this cost model. 

Although higher-order series approximations clearly result in a qualitatively bet¬ 
ter preconditioner, the number of Krylov iterations is not reduced sufficiently to 
compensate for the cost of the extra inversions. Indeed, while the addition of mul¬ 
tiple series terms in the EX (in) preconditioner improves the spectral properties of 
the preconditioned system, the increased computational cost of the extra CSL inver¬ 
sions appears to be a bottleneck for performance. Hence, one observes that standard 
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m 

p(m) 

p(m) 

Pi 1) 

CPU time 

i 

49 

1.00 

0.57 s. 

2 

39 

0.80 

0.68 s. 

3 

34 

0.69 

0.80 s. 

4 

31 

0.63 

0.88 s. 

5 

30 

0.61 

1.02 s. 


Table 1.2 Performance of £X(m)-BiCGStab for different values of m on the discretized ID 
Helmholtz model problem (1.43). V(l,l)-cycle approximate preconditioner inversion. Column 1: 
number of BiCGStab iterations p(m) required to solve the system (1.44). Col. 2: iteration ratio 
compared to CSL. Col. 3: CPU time until convergence (in seconds). 


CSL preconditioning - which takes only one series term into account - is the most 
cost-efficient, requiring a minimum of 34 WU for the entire solve. 


1.4.4 Multigrid inversion of the expansion preconditioner 

For convenience of analysis a direct inversion of the preconditioning scheme was 
used in the previous sections. However, in realistic large-scale applications the terms 
of the EX(m) preconditioner often cannot be computed directly. Instead, the CSL 
systems comprising the EX(m) polynomial are approximately solved using some 
iterative method. In this section we use one geometric multigrid V(l,l)-cycle to ap¬ 
proximately solve the CSL systems, which is a standard approach in the Helmholtz 
literature [16, 14]. The V(l,l)-cycle features the traditional linear interpolation and 
full weighting restriction as intergrid operators, and applies one weighted lacobi it¬ 
eration (with parameter 2/3) as a pre- and post-smoother. The choice of the damping 
parameter = 0.6 guarantees stability of the multigrid solver for the inversion of 
the CSL operators, see [12], 

Table 1.2 summarizes the number of Krylov iterations and CPU time 1 required 
to solve system (1.44) using EX (m)-BiCGStab for different values of m. The appli¬ 
cation of the EX(m) operator is approximately computed using a total of m V(l,l)- 
cycles, see Section 1.2.2.5. The corresponding convergence histories are shown in 
Figure 1.4. The addition of terms in the EX (in) preconditioner reduces the num¬ 
ber of Krylov iterations as expected, although the improvement is less pronounced 
compared to the results in Table 1.1 due to non-exact inversion of the CSL opera¬ 
tors. However, the increased cost to (approximately) compute the additional series 
terms for larger values of m is clearly reflected in the timings. Hence, in terms of 
preconditioner computational cost, the classic CSL or EX(l) preconditioner is the 
most cost-efficient member of the EX ( m ) preconditioner class for this benchmark 
problem. 

1 Hardware specifications: Intel Core i7-2720QM 2.20GHz CPU, 6MB Cache, 8GB RAM. Soft¬ 
ware specifications: Windows 7 64-bit OS, experiments implemented in MATLAB R20I5a. 
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Fig. 1.4 Convergence of £X(m)-BiCGStab and solution of the discretized ID Helmholtz model 
problem (1.43). V(l,l)-cycle approximate preconditioner inversion. Left: £'X(m)-BiCGStab rela¬ 
tive residual history ||rp||/||ro|| for various values of m. Vertical axis in log scale. Right: numerical 
solution u(x) to the equation (1.43) up to the relative residual tolerance tol = le—8. The ECS 
absorbing boundary layer rapidly damps the solution outside the unit domain Q = [0,1], 


1.4.5 Validation of the extended expansion preconditioner 

In this section we validate the generalizations to the expansion preconditioner pro¬ 
posed in Section 1.3 on the ID Helmholtz model problem (1.43). 

The weighted fixed-point iteration (1.30) generates the generalized class of ex¬ 
pansion preconditioners EX m (m). Figure 1.5 shows the spectrum (left panel) and 
condition number (right panel) of the Helmholtz operator preconditioned by the 
two-term EX m (2) polynomial for different values of the parameter ft). Note that the 
condition number of the standard EX(2) preconditioner (ft) = 1) is k(EX(2)A) = 
17.29. A small improvement in conditioning is achievable through the right choice 
of the parameter ft), reducing the condition number to K(EX m (2)A ) = 15.13 for pa¬ 
rameter choices around ft) = 2. With the optimal choice for ft), the condition number 
of the classic CSL preconditioner EX o(2) is halved when using EX a { 2), suggesting 
a halving of the number of Krylov iterations may be achievable by using EX a { 2) 
instead of the classic CSL preconditioner. The smaller condition number implies an 
increase in performance compared to the EX (2) preconditioner, making the addition 
of extra terms in EX m (m ) theoretically cost-efficient. 

A first step towards a simultaneous construction of the EX ( m ) preconditioning 
polynomial and the outer Krylov basis resulting in the mixed basis (1.41) is illus¬ 
trated in Figure 1.6. The black curve shows the experimentally determined maxi¬ 
mum power of A (Krylov iterations) versus the maximum power of L (terms in the 
preconditioning polynomial) required to solve the Helmholtz benchmark problem 
(1.43) up to a relative residual tolerance tol = le—8. Subject to this tolerance, 
a solution is found either after 40 EX (I )-BiCGStab iterations or alternatively af¬ 
ter one EX (70)-BiCGStab iteration. Indeed, the incorporation of 70 terms in the 
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Real Parameter u 

Fig. 1.5 Spectral analysis of the discretized ID Helmholtz model problem (1.43). Exact precondi¬ 
tioner inversion. Left: spectrum of the preconditioned operator EX m (2)A for different values of the 
parameter ft). The spectrum for EX{m)A with to = 1 is indicated in black, see also Fig. 1.2. Right: 
condition number of the preconditioned operator EX m (2)A as a function of the weight ft). 



Power of L 


Fig. 1.6 Performance of the EX ( m ) preconditioner formed by the construction of the mixed basis 
(1.41) on the discretized ID Helmholtz model problem (1.43). Maximum power of L, i.e. the 
polynomial preconditioner degree m, vs. maximum power of A, i.e. the number of Krylov iterations 
p(m). Red curve: theoretical upper bound required for cost-efficiency. Black curve: experimentally 
measured results. 


EX {in) expansion preconditioner effectively reduces the number of outer Krylov 
iterations to 1. However, note that to be cost-efficient with respect to the number 
of CSL inversions, the same solution should be found using the EX (40) (degree 
40) polynomial preconditioner. The red curve represents a constant number of CSL 
inversions for the total run of the method. To ensure cost-efficiency of the class of 
EX{m) preconditioners for m > 1, the experimental black curve should fall below 
the theoretical red curve, which is not the case. Hence, the most simple case of the 
EX{\) or CSL preconditioner can again be considered optimal w.r.t. cost-efficiency. 
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Fig. 1.7 Convergence of £X(m)-BiCGStab and solutions of the discretized 2D Helmholtz model 
problem (1.46). Top: wavenumber k 2 = 5 x 10 3 and N = n x x n y = 128 x 128 unknowns. Bottom: 
wavenumber k 2 = 2 x 10 4 and N = 256 x 256 unknowns. V(l,l)-cycle approximate preconditioner 
inversion. Left: £X(m)-BiCGStab relative residual history ||fp||/||ro|| for various values of m. Ver¬ 
tical axis in log scale. Right: numerical solution u(x,y) to the equation (1.46) up to the relative 
residual tolerance tol = le—8. 


1.4.6 Problem setting: a 2D constant wavenumber Helmholtz 
problem with absorbing boundary conditions 

To conclude this work we extend the above ID model problem (1.43) to a two 
dimensional Helmholtz problem. Numerical results for solution using EX(m)- 
preconditioned BiCGStab and GMRES are provided, and we comment on the seal- 
ability of the expansion preconditioner functionality to higher spatial dimensions. 
Consider the two-dimensional constant wavenumber Helmholtz model problem 

(-A-k 2 )u(x,y)=f(x,y), (x,y) S Q = [0, l] 2 , 


(1.46) 
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n x x n y 

= 128 x 128 

n x x n y 

= 256 x 256 


k 2 

= 5e+3 

k 2 

= 2e+4 

m 

p(m) 

CPU time 

p(m) 

CPU time 

i 

37 

14.7 s. 

140 

157.0 s. 

2 

26 

19.0 s. 

112 

210.8 s. 

3 

22 

23.1 s. 

105 

277.9 s. 

4 

20 

26.3 s. 

104 

351.0 s. 

5 

18 

30.5 s. 

103 

436.2 s. 


EX(m)-preconditioned BiCGStab 


Table 1.3 Performance of £X(m)-BiCGStab for different values of m on the discretized 2D 
Helmholtz model problem (1.46). V(l,l)-cycle approximate preconditioner inversion. Column 1 
& 3: number of £X(m)-BiCGStab iterations p(m). Col. 2 & 4: total CPU time until convergence 
(in seconds). 


where the right-hand side f(x,y) again represents a unit source in the domain cen¬ 
ter and outgoing wave boundary conditions are implemented using Exterior Com¬ 
plex Scaling with Oecs = ft/6. We consider two different wavenumbers, namely 
k 2 = 5e+3 and k 2 = 2e+4, corresponding to a moderate- and high-energetic wave 
respectively. Equation (1.46) is discretized using n x = n y =128 (for k 2 = 5e+3) 
and n x = n y = 256 (for k 2 = 2e+4) real-valued grid points in each spatial di¬ 
mension, respecting the wavenumber criterion kh < 0.625 [4] in every direction. 
Note that the discretized 2D Helmholtz operator with ECS boundary conditions 
can be constructed from the ID Helmholtz operator using Kronecker products, i.e. 
A 2d =A\ D ®Iy+I x ®A y D , where I x G C' 1a ' x,!jc and I y £ C n y xn > are identity matrices. 

Figure 1.7 shows the EX (m)-BiCGStab solver convergence history for various 
values of m (left) and the solution u(x,y) (right) to the 2D model problem (1.46) 
for different wavenumbers and corresponding discretizations. The EX (m) precondi¬ 
tioner is approximately inverted using m multigrid V( 1,1 )-cycles with a weighted Ja- 
cobi smoother (weighting parameter 4/5). The corresponding number of BiCGStab 
iterations until convergence up to the relative residual tolerance tol= le—8 are dis¬ 
played in Table 1.3. The observations from the ID spectral analysis extend directly 
to the 2D setting, as the table shows that the use of the EX (in) preconditioner re¬ 
sults in a significant reduction of the number of outer Krylov iterations for growing 
values of m. 

Table 1.3 additionally features the CPU timings for the wavenumbers k 2 = 5e+3 
and 2e+4. Note that although the number of Krylov iterations is reduced as a func¬ 
tion of the preconditioner degree m, the CPU timings are rising in function of m. 
The computational cost of performing in multigrid V-cycles (compared to just one 
V-cycle for the CSL preconditioner) has a clear impact on the CPU timings. As a 
result, it is often advisable in view of cost-efficiency to restrict the expansion to the 
first term only (CSL preconditioner), where only one V-cycle is required to obtain 
an (approximate) preconditioner inverse. These observations are comparable to the 
ID results from Section 1.4.4. 

In Table 1.4 results for solving the same system using EX(m) -preconditioned 
GMRES are shown. Note that contrary to the BiCGStab results in Table 1.3, the 
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n x x n y 

= 128x 128 

n x x n y 

= 256 x 256 


k 2 

= 5e+3 

k 2 

= 2e+4 

m 

P(m) 

CPU time 

P( m ) 

CPU time 

i 

67 

19.0 s. 

233 

540.8 s. 

2 

50 

22.9 s. 

191 

477.3 s. 

3 

41 

26.8 s. 

175 

497.5 s. 

4 

37 

31.8 s. 

168 

547.8 s. 

5 

34 

35.9 s. 

165 

611.3 s. 


EX(m)-preconditioned GMRES 


Table 1.4 Performance of £X(m)-GMRES for different values of m on the discretized 2D 
Helmholtz model problem (1.46). V(l,l)-cycle approximate preconditioner inversion. Column 1 
& 3: number of £X(m)-GMRES iterations p(m). Col. 2 & 4: total CPU time until convergence (in 
seconds). 


EX (m) preconditioner does appear to be cost-efficient for the 2D Helmholtz prob¬ 
lem with wavenumber k 2 = 2e+4 for values of m> 1. Indeed, in this case the opti¬ 
mal preconditioner with respect to CPU time is the second-order polynomial EX (2), 
which reduces the number of outer Krylov iterations to 191 and minimizes the CPU 
time to 477.3 seconds, compared to 540.8 seconds for EX (l)-GMRES. The main 
cause for this phenomenon is the relatively high per-iteration computational cost of 
the GMRES algorithm, which is caused by the orthogonalization procedure with 
respect to the Krylov subspace basis vectors. This cost is especially pronounced for 
larger iteration numbers. Hence, the good approximation properties of the EX(m ) 
preconditioner for higher values of m may prove useful when the Krylov iteration 
cost is non-marginal compared to the cost of applying the preconditioner. The idea 
of polynomial preconditioners for GMRES has recently been proposed in the liter¬ 
ature, see e.g. [24], 


1.5 Conclusions 

In this work we have proposed a theoretical framework that generalizes the classic 
shifted Laplacian preconditioner by introducing the class of polynomial expansion 
preconditioners EX(m). This concept extends the one-term CSL preconditioner to 
an m-term Taylor polynomial in the inverse complex shifted Laplace operator. The 
outer iteration for solving the preconditioned system used in this work is a tradi¬ 
tional Krylov iteration such as BiCGStab or GMRES. 

Key properties of the EX (m) preconditioner class are its structure as a finite m- 
term series of powers of CSL inverses (Neumann series), and its resulting asymp¬ 
totic exactness, meaning EX(m) approaches A -1 in the limit for m going to infinity. 
The polynomial structure of the EX (, m ) preconditioner makes it easy to compute 
an iterative approximation to this polynomial, using e.g. m multigrid V(l,l)-cycles 
(one for each term), which are guaranteed to converge given that the complex shift 
is chosen to be sufficiently large. 
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The preconditioning efficiency of the EX (m) preconditioner is validated using 
a classic eigenvalue analysis. The addition of extra terms in the preconditioning 
polynomial clusters the spectrum around 1, which reduces the condition number of 
the preconditioned Helmholtz operator and suggests a significant reduction in the 
number of outer Krylov iterations for large m. 

Numerical results on ID and 2D Helmholtz benchmark problems support the 
theoretical results. The number of outer Krylov iterations is reduced significantly 
by the higher degree expansion preconditioners. Unfortunately, the computational 
cost of applying the EX(m) preconditioner is directly proportionate to the number 
of terms m, since an extra (approximate) CSL inversion is required for each addi¬ 
tional term in the polynomial. The use of a large number of series terms is thus not 
necessarily guaranteed to result in a more cost-efficient preconditioner. 

Following the numerical results of the ID and 2D experiments, these conclu¬ 
sions are expected to be directly generalizable to higher spatial dimensions. More¬ 
over, the constant wavenumber experiments performed in this paper are extensible to 
Helmholtz problems with heterogeneous and/or discontinuous wavenumbers, pro¬ 
vided that the complex shift variable in the EX (in) polynomial is large enough to 
allow for a stable numerical solution of the shifted Laplace systems for all wavenum¬ 
ber regimes occuring in the problem. Hence, from a practical preconditioning inter¬ 
est, the simple one-term shifted Laplace preconditioner appears to be the optimal 
member of the EX(m) class for many applications and problem configurations. 

Furthermore, two generalizations to the class of expansion preconditioners were 
presented and analyzed. These generalizations primarily prove to be insightful from 
a theoretical point of view. It is shown that the Taylor expansion preconditioner 
can be substituted by an optimal m-term polynomial which is theoretically cost- 
efficient. However, in practical applications the reduction of the number of outer 
Krylov iterations due to EX(m) preconditioning often does not pay off to the cost 
of the extra approximate multigrid inversions. 

For systems in which the cost of applying the outer Krylov step becomes signifi¬ 
cant relative to the cost of the preconditioner application, the use of multiple terms 
in the EX (m) expansion could result in a more cost-efficient solver. Possible sce¬ 
narios for this include the application of non-restarted GMRES as the outer Krylov 
solver, the use of higher order discretization schemes for the original Laplace oper¬ 
ator (while maintaining second order discretization for the preconditioner), etc. 

Additionally, the treatment of extremely large-scale HPC systems on massively 
parallel hardware may warrant the need for higher-order polynomial precondition¬ 
ers. Since Krylov methods are typically communication (or bandwidth) bound in¬ 
stead of compute bound in this context, polynomial preconditioning directly reduces 
the number of communication bottlenecks (dot-products) by reducing the number 
of Krylov iterations, while simultaneously improving the arithmetic intensity of the 
solver. A detailed analysis of these individual scenarios is however well beyond the 
scope of this text, and is left for future work. 

The generalization to the shifted Laplace preconditioner for Helmholtz problems 
proposed in this work is particularly valuable from a theoretical viewpoint, provid¬ 
ing fundamental insights into the concept of shifted Laplace preconditioning by sit- 
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uating the classic complex shifted Laplace operator in a broader theoretical context, 
and proving the classic CSL preconditioner to be the most cost-efficient member of 
the EX(m ) preconditioner class for the most common practical problems. 
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